ABSTRACT

Design of antisense oligonucleotides targeting any 
mRNA can be much more efficient when several 
activity-enhancing motifs are included and activity- 
decreasing motifs are avoided. This conclusion was 
made after statistical analysis of data collected from 
>1000 experiments with phosphorothioate-modified 
oligonucleotides. Highly significant positive correlation 
between the presence of motifs CCAC, TCCC, ACTC, 
GCCA and CTCT in the oligonucleotide and its anti- 
sense efficiency was demonstrated. In addition, 
negative correlation was revealed for the motifs 
GGGG, ACTG, AAA and TAA. It was found that the 
likelihood of activity of an oligonucleotide against a 
desired mRNA target is sequence motif content 
dependent. 

INTRODUCTION 

Antisense oligonucleotides are gaining importance for the rapid 
analysis, in tissue culture, of the effects of lowered expression 
of the increasing number of sequenced genes, and their 
possible therapeutic utility is being actively investigated. One 
important question in antisense technology is the identification 
of mRNA sites that can be targeted efficiently. Many experiments 
have shown that certain antisense oligonucleotides are more 
active than others in suppressing specific gene expression. A 
routine approach to find the most active antisense oligonucleo- 
tides involves synthesis of numerous complementary oligo- 
nucleotides (up to several dozen) for different regions of the 
targeted mRNA, followed by activity screening in cells (1-12). 
Strategies reducing the number of antisense oligonucleotides 
for intracellular tests should have significant benefit. 

The calculated Gibbs free energy (ΔG°37) values for duplex 
formation between an oligonucleotide and mRNA molecules 
correlate with oligonucleotide antisense activity (13), though 
hybridization affinity alone is not sufficient to ensure antisense 
oligonucleotide efficiency in cells (3). Systematic alignment of 
computer-predicted local RNA secondary structures proved to 
be an improvement over trial and error screening in selecting 
antisense oligonucleotides for inhibition of expression of the 
gene for intracellular adhesion molecules (14). Considerations of 
the predicted stabilities of antisense oligonucleotide-target-RNA 
duplexes and their competition with predicted secondary 
structures of both the targets and antisense oligonucleotides 
may also be valuable for antisense research (13,15). 

Another theoretical strategy for identification of efficient 
antisense oligonucleotides arises from the finding that the 
motif TCCC is overrepresented amongst the most active 
oligos compared to their inactive counterparts. This finding 
was made during analysis of published oligonucleotide 
sequences and in prospective experiments with TNF-α mRNA 
where oligos containing the TCCC motif had a much higher 
success rate (50%) than oligonucleotides selected by trial and 
error (6%) (16). The correlation between the occurrence of subsequence 
motifs in oligonucleotides and their antisense activity is further explored here 
using a database of antisense molecules from previously 
published experiments. 

MATERIALS AND METHODS 

Database 

Two selection criteria were used for choosing publications 
from which to extract oligonucleotide sequences for inclusion 
in the database. First, activity of oligonucleotides must have 
been measured by assays that evaluated the cellular level of 
antisense effect on a specific mRNA or its protein product. 
Second, at least ten different oligonucleotides targeting the 
same mRNA had to be tested under identical experimental 
conditions. The resulting database contains the names of 
targeted mRNAs (genes), oligonucleotide sequences, data on 
their antisense activities (expressed as the ratio of levels of 
particular mRNA or protein measured in cells after treatment 
with experimental antisense versus control oligonucleotide) 
and literature references. The database is on the Web 
(http://antisense.genetics.utah.edu) and is described in more detail in 
a separate publication (17). Unlike previous work of this kind 
(18), we analyzed oligonucleotides that target different parts of 
mRNA rather than molecules that are complementary only to 
mRNA translation initiation regions. This approach permits 
independence from any motif bias related to initiation 
sequences. 

Statistical analysis 

For the database analysis, the program Oligostat (A.Tsodikov 
and O.Matveeva, manuscript in preparation, with the program 
available upon request) was created and used in combination 
with Excel (Microsoft, Inc.). Oligostat calculates the correlation 
coefficients (t-test) (19) of oligo activity and motif occurrence. 
In this work 'motif' is used for a continuous stretch of 3 or 4 nt, 
so that the sequence -GCCACTCT- contains six 3- and five 4-nt 
motifs. Correlation analysis (t-test) was chosen rather than the 
chi-square test as it avoids defining an arbitrary cut-off point in 
classifying oligonucleotides as active or inactive by utilizing 
continuous activity input. Using motifs with statistically 
significant correlation coefficient values (P < 0.05), Oligostat 
allows the user to create a logistic regression model (19) that 
relates the probability of activity with the motif content of each 
oligonucleotide. In addition, Oligostat performs a likelihood 
ratio test (19) for selection of motifs that are most significant 
for the model. To find motifs that are less dependent on potentially 
biased or inaccurate information, which might be present in 
any published work, 'minus one mRNA' verification was used. 
In this procedure, oligo subsets that target each mRNA were 
removed, in turn, from the database and the remaining parts of 
the database were analyzed to find motifs with significant 
correlation coefficients. 
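
The motif counting and correlation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Oligostat program itself: the function names `motifs`, `motif_correlation` and `minus_one_mrna` are hypothetical, the activity score is assumed to be scaled so that a higher value means stronger inhibition, and the t statistic is the standard one for testing a correlation coefficient against zero.

```python
import math


def motifs(seq, sizes=(3, 4)):
    """List every contiguous stretch of the given sizes, in order of position."""
    seq = seq.upper()
    return [seq[i:i + k] for k in sizes for i in range(len(seq) - k + 1)]


def motif_correlation(oligos, motif):
    """Correlate motif presence (0/1) with a continuous activity score.

    oligos: list of (sequence, activity) pairs; activity is ASSUMED to be
    scaled so that a higher value means stronger inhibition.
    Returns (r, t) with t = r * sqrt((n - 2) / (1 - r^2)), or None when
    there are fewer than 3 oligos or either variable is constant.
    """
    x = [1.0 if motif in seq.upper() else 0.0 for seq, _ in oligos]
    y = [float(act) for _, act in oligos]
    n = len(x)
    if n < 3:
        return None
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, math.copysign(math.inf, r)
    return r, r * math.sqrt((n - 2) / (1.0 - r * r))


def minus_one_mrna(records, motif):
    """Recompute the correlation with each target's oligos left out in turn.

    records: list of (target_name, sequence, activity) triples.
    Returns a dict mapping the left-out target to motif_correlation's result.
    """
    targets = sorted({target for target, _, _ in records})
    return {
        left_out: motif_correlation(
            [(seq, act) for target, seq, act in records if target != left_out],
            motif,
        )
        for left_out in targets
    }
```

Calling `motifs("GCCACTCT")` returns the six 3-nt and five 4-nt motifs mentioned in the text. A motif whose correlation stays significant for every left-out target in `minus_one_mrna` is less likely to owe its score to a single, possibly biased, data source.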

To determine the extent and reproducibility of the effect of 
the presence of certain motifs in increasing or decreasing the 
proportion of active oligonucleotides, three sets of data were 
used. The results obtained with each set were then compared. 
The first set utilized the oligonucleotides studied in the 
screening experiments published by ISIS Pharmaceuticals 
(Carlsbad, CA) (147 oligonucleotides). The second set utilized 
data of experiments reported by other investigators under more 
heterogeneous conditions (202 oligonucleotides). The third set 
utilized the data from unpublished experiments performed by 
ISIS Pharmaceuticals (908 oligonucleotides). Set 3 does not 
overlap with sets 1 or 2. Another test was to analyze the data 
with two groupings of the active oligos. One group contains 
oligos that decrease the level of mRNA or protein in cells to 
one-quarter of the control level or less, and the other group 
combines oligos that decrease the level of mRNA or protein at 
least 2-fold. 
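
The two groupings can be made concrete with a short Python sketch. It assumes that each oligo is stored with the activity ratio defined for the database (target level after antisense treatment divided by the level after control treatment), so that a ratio of 0.25 or less corresponds to the first grouping and 0.5 or less to the second. The function name `proportion_active` and the sample data are hypothetical.

```python
def proportion_active(oligos, motif, max_ratio):
    """Fraction of active oligos with and without a motif.

    oligos: list of (sequence, ratio) pairs, where ratio is the target level
    after antisense treatment divided by the level after control treatment.
    An oligo counts as active when ratio <= max_ratio.
    Returns (fraction_with_motif, fraction_without_motif); a fraction is
    None when its group is empty.
    """
    with_motif = [ratio <= max_ratio for seq, ratio in oligos
                  if motif in seq.upper()]
    without_motif = [ratio <= max_ratio for seq, ratio in oligos
                     if motif not in seq.upper()]

    def fraction(flags):
        return sum(flags) / len(flags) if flags else None

    return fraction(with_motif), fraction(without_motif)


# Hypothetical data, for illustration only.
sample = [("TTCCCA", 0.2), ("GTCCCG", 0.4), ("AAAGGG", 0.6), ("TTTAAG", 0.3)]
strict = proportion_active(sample, "TCCC", 0.25)   # to one-quarter or less
lenient = proportion_active(sample, "TCCC", 0.5)   # at least 2-fold decrease
```

Comparing the two fractions under both thresholds, separately for each data set, gives the kind of check on reproducibility that the groupings are meant to provide.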

RESULTS AND DISCUSSION 

Correlation coefficients for motif occurrence and oligo activity 
were determined with the combined data sets from published 
experiments (sets 1 and 2). This analysis revealed several 
dozen motifs with significant correlation coefficient values. 
The list of motifs identified with Oligostat after all procedures 
including logistic regression, likelihood ratio test and 'minus 
one mRNA' test (Materials and Methods) is much shorter and 
includes only 10 triplet plus quadruplet motifs. Nine of these 
10 motifs (all except CCGG) were also confirmed to be positive 
or negative predictors of oligo activity in additional testing 
using a database of 908 oligonucleotides (set 3 with Isis 
Pharmaceuticals' unpublished data) (Table 1). It is noted that 
all motifs with positive correlation coefficients are C rich. 

It is not completely clear why the presence of certain motifs 
correlates positively with oligonucleotide antisense activity. 
One consideration is that pyrimidine-rich oligonucleotides, 
especially those that are C-rich, are able to form the most stable 
DNA-RNA duplexes (20). Another is that mammalian RNase 
H, which is responsible for the antisense effect of phosphoro- 
thioate-modified oligonucleotides, may have some sequence 
preferences that could contribute to the observed bias. Prefer- 
ential binding of RNase H to the A-form of heteroduplexes 
might be responsible for some cleavage specificity of human 
RNase H (21) because pyrimidine-rich sequences of antisense 
oligonucleotides should form an A-form duplex with RNA 
target sites (22). Finally, some motifs may be required for more 
efficient cellular uptake of oligonucleotides. The reasons for 
the negative correlation of some motifs are also unknown. It is 
likely that the motif 'GGGG' can promote self-interacting 
structures in oligonucleotides that would make them less 
available for interaction with target mRNA. Irrespective of the 
underlying mechanism involved, the presence of a motif 
identified above correlates with an increase, or decrease, of the 
proportion of active oligos in nearly every set of molecules 
(Fig. 1). As described in Materials and Methods, three subsets of 
the data were analyzed and two groupings of activity values 
were employed. A consistent effect was generally seen with 
each subdivision, though, not surprisingly because of the 
combination of data from different origins utilizing different 
assays and concentrations, some variation is also evident. One 
exception is the motif CCGG, which was identified as a 
negative predictor of activity in two data subdivisions (sets 1 
and 2). However, this finding was not substantiated in the third 
subdivision (set 3, Fig. 1B). 



Table 1. List of motifs whose presence is correlated with antisense 
oligonucleotide activity 

Motif    Correlation coefficient    Significance 
CCAC      0.3                       2.6E-09 
TCCC      0.3 
ACTC      0.2                       4.7E-05 
GCCA      0.2                       0.0015 
CTCT      0.1                       0.007 
GGGG                                4.7E-08 
ACTG                                0.0005 
TAA      -0.2                       0.002 
CCGG     -0.1                       0.02 
AAA      -0.1                       0.03 

Motifs in red are 'positive', their presence positively 
correlated with antisense oligonucleotide activity. Motifs in 
green are 'negative', their presence negatively correlated 
with activity. 



It is seen from Figure 2 that the proportion of active molecules 
is higher for the group of oligos with several 'positive' motifs in 
comparison with the group of oligos with only one 'positive' 
motif. Combination of several 'positive' motifs in an oligo- 
nucleotide may be beneficial. The subset of oligonucleotides in 
the database with more than one 'positive' motif is not large 
enough for careful statistical analysis to address the question of 
whether overlapping or non-overlapping motifs are better for 
antisense activity. 
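
As a minimal sketch of how motif content might be tallied for a candidate oligonucleotide, the Python function below reports which of the 'positive' and 'negative' motifs named in the Abstract occur in a sequence. The function name `motif_profile` is hypothetical, CCGG is left out because it was not confirmed in set 3, and the tally is a simple presence count rather than the logistic regression model built by Oligostat.

```python
# Motifs as listed in the Abstract; CCGG is omitted because it was not
# confirmed as a negative predictor in the third data set.
POSITIVE_MOTIFS = ("CCAC", "TCCC", "ACTC", "GCCA", "CTCT")
NEGATIVE_MOTIFS = ("GGGG", "ACTG", "AAA", "TAA")


def motif_profile(oligo):
    """Return (positive motifs present, negative motifs present).

    The sequence is the antisense oligonucleotide itself, written 5' to 3'
    in the DNA alphabet. Each motif is reported once, however many times
    it occurs, in the order used in the tuples above.
    """
    seq = oligo.upper()
    positives = [m for m in POSITIVE_MOTIFS if m in seq]
    negatives = [m for m in NEGATIVE_MOTIFS if m in seq]
    return positives, negatives


positives, negatives = motif_profile("GCCACTCT")
```

For the example sequence -GCCACTCT- used in Materials and Methods, `positives` holds four motifs (CCAC, ACTC, GCCA and CTCT, which overlap one another) and `negatives` is empty. Candidates could then be ranked by the number of 'positive' motifs they carry, keeping in mind that the data do not yet show whether overlapping or non-overlapping motifs are preferable.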

In conclusion, the activity of antisense oligonucleotides is 
correlated with certain sequence motifs. Understanding the 
reason for this correlation will require much further work, but 
its existence can be used for identification of mRNA sites that 
are most susceptible for efficient antisense targeting. 

Figure 1. Positive (A) and negative (B) correlation of oligonucleotide antisense activity with the presence of some sequence motifs. Oligonucleotides from the 
database were categorized into groups according to the presence, or absence, of motifs that positively (A) or negatively (B) correlated with antisense activity. Red 
(dotted) columns represent oligonucleotides with 'positive' motifs (+). Green (hatched) columns represent oligonucleotides with 'negative' motifs (+) and blue 
columns represent oligonucleotides without the specified motif (-). Data in set 1 are from Isis Pharmaceuticals' published work, data in set 2 are from the published 
work of other investigators and data in set 3 are from unpublished work of Isis Pharmaceuticals. 

Figure 2. Correlation of oligonucleotide antisense activity with the number of positive sequence motifs. Red (spotted) columns show groups of oligos with motifs 
that are positively correlated with activity. Blue columns show combined groups of oligos with 'negative' or without 'positive' motifs. Data sets are described in 
Figure 1. 